.TH pgreplay 1 "" "Jun 2011" "PostgreSQL Utilities"
.SH NAME
pgreplay \- PostgreSQL log file replayer for performance tests
.SH SYNOPSIS
\fBpgreplay\fP [\fIparse options\fR] [\fIreplay options\fR]
[\fB-d\fR \fIlevel\fR] [\fIinfile\fR]
.br
\fBpgreplay\fP \fB-f\fP [\fIparse options\fR] [\fB-o\fP \fIoutfile\fR]
[\fB-d\fR \fIlevel\fR] [\fIinfile\fR]
.br
\fBpgreplay\fP \fB-r\fP [\fIreplay options\fR] [\fB-d\fR \fIlevel\fR]
[\fIinfile\fR]
.SH DESCRIPTION
\fBpgreplay\fR reads a PostgreSQL log file (\fInot\fR a WAL file), extracts
the SQL statements and executes them in the same order and relative time
against a PostgreSQL database cluster.
A final report gives you a useful statistical analysis of your workload
and its execution.
.P
In the first form, the log file \fIinfile\fR is replayed at the time it is
read.
.P
With the \fB-f\fR option, \fBpgreplay\fR will not execute the statements, but
write them to a \(oqreplay file\(cq \fIoutfile\fR that can be replayed with
the third form.
.P
With the \fB-r\fP option, \fBpgreplay\fR will execute the statements in the
replay file \fIinfile\fR that was created by the second form.
.P
If the execution of statements gets behind schedule, warning messages
are issued that indicate that the server cannot handle the load in a
timely fashion.
The idea is to replay a real-world database workload as exactly as possible.
.P
To create a log file that can be parsed by \fBpgreplay\fR, you need to set the
following parameters in \fBpostgresql.conf\fR:
.IP
\fBlog_min_messages=error\fR (or more)
.br
\fBlog_min_error_statement=log\fR (or more)
.br
\fBlog_connections=on\fR
.br
\fBlog_disconnections=on\fR
.br
\fBlog_line_prefix=\(aq%m|%u|%d|%c|\(aq\fR (if you don\(aqt use CSV logging)
.br
\fBlog_statement=\(aqall\(aq\fR
.br
\fBlc_messages\fR must be set to English (encoding does not matter)
.br
\fBbytea_output=escape\fR
(from version 9.0 on, only if you want to replay the log on 8.4 or earlier)
.P
The database cluster against which you replay the SQL statements must be
a clone of the database cluster that generated the logs from the time
\fIimmediately before\fR the logs were generated.
.P
\fBpgreplay\fR is useful for performance tests, particularly in the following
situations:
.TP 4
*
You want to compare the performance of your PostgreSQL application
on different hardware or different operating systems.
.TP 4
*
You want to upgrade your database and want to make sure that the new
database version does not suffer from performance regressions that
affect you.
.P
Moreover, \fBpgreplay\fR can give you some feeling as to how your application
\fImight\fR scale by allowing you to try to replay the workload at a higher
speed.  Be warned, though, that 500 users working at double speed is not really
the same as 1000 users working at normal speed.
.SH OPTIONS
.SS Parse options:
.TP
\fB-c\fR
Specifies that the log file is in \(aqcsvlog\(aq format (highly recommended)
and not in \(aqstderr\(aq format.
.TP
\fB-b\fR \fItimestamp\fR
Only log entries greater or equal to that timestamp will be parsed.
The format is \fBYYYY-MM-DD HH:MM:SS.FFF\fR like in the log file.
An optional time zone part will be ignored.
.TP
\fB-e\fR \fItimestamp\fR
Only log entries less or equal to that timestamp will be parsed.
The format is \fBYYYY-MM-DD HH:MM:SS.FFF\fR like in the log file.
An optional time zone part will be ignored.
.TP
\fB-q\fR
Specifies that a backslash in a simple string literal will escape
the following single quote.
This depends on configuration options like
\fBstandard_conforming_strings\fR and is the default for server
version 9.0 and less.
.TP
\fB-D\fR \fIdatabase\fR
Only log entries related to the specified database will be parsed
(this option can be specified multiple times for more than one database).
.TP
\fB-U\fR \fIusername\fR
Only log entries related to the specified username will be parsed
(this option can be specified multiple times for more than one user).
.SS Replay options:
.TP
\fB-h\fR \fIhostname\fR
Host name where the target database cluster is running (or directory where
the UNIX socket can be found). Defaults to local connections.
.br
This works just like the \fB-h\fR option of \fBpsql\fR.
.TP
\fB-p\fR \fIport\fR
TCP port where the target database cluster can be reached.
.TP
\fB-W\fR \fIpassword\fR
By default, \fBpgreplay\fR assumes that the target database cluster
is configured for \fItrust\fR authentication. With the \fB-W\fR option
you can specify a password that will be used for all users in the cluster.
.TP
\fB-s\fR \fIfactor\fR
Speed factor for replay, by default 1. This can be any valid positive
floating point number. A \fIfactor\fR less than 1 will replay the workload
in \(oqslow motion\(cq, while a \fIfactor\fR greater than 1 means
\(oqfast forward\(cq.
.TP
\fB-E\fR \fIencoding\fR
Specifies the encoding of the log file, which will be used as client
encoding during replay. If it is omitted, your default client encoding will
be used.
.TP
\fB-j\fR
If all connections are idle, jump ahead to the next request instead of
sleeping. This will speed up replay. Execution delays will still be reported
correctly, but replay statistics will not contain the idle time.
.TP
\fB-n\fR
Dry run mode. No connections to the server are made.
Useful for checking if the replay file is corrupt or to get statistics
about the replay file (number of statements, original duration, ...)
.TP
\fB-X\fR \fIoptions\fR
Extra connection options for replay connections. These must be libpq
connection options specified in the format \(oqoption=value [...]\(cq.
.SS Output options:
.TP
\fB-o\fP \fIoutfile\fR
specifies the replay file where the statements will be written
for later replay.
.SS Debug options:
.TP
\fB-d\fR \fIlevel\fR
Specifies the trace level (between 1 and 3). Increasing levels will produce
more detailed information about what \fBpgreplay\fR is doing.
.TP
\fB-v\fR
Prints the program version and exits.
.SH ENVIRONMENT
.TP
\fBPGHOST\fR
Specifies the default value for the \fB-h\fR option.
.TP
\fBPGPORT\fR
Specifies the default value for the \fB-p\fR option.
.TP
\fBPGCLIENTENCODING\fR
Specifies the default value for the \fB-E\fR option.
.SH LIMITATIONS
\fBpgreplay\fR can only replay what is logged by PostgreSQL.
This leads to some limitations:
.TP 4
*
\fBCOPY\fR statements will not be replayed, because the copy data are not
logged.
.TP 4
*
Fast-path API function calls are not logged and will not be replayed.
Unfortunately, this includes the Large Object API.
.TP 4
*
Since the log file is always in the server encoding (which you can specify
with the \fB-E\fR switch of \fBpgreplay\fR), all
\fBSET client_encoding\fR statements will be ignored.
.TP 4
*
Since the preparation time of prepared statements is not logged (unless
\fBlog_min_messages\fR is \fBdebug2\fR or more), these statements will be
prepared immediately before they are first executed during replay.
.TP 4
*
Because the log file contains only text, query parameters and return values
will always be in text and never in binary format. If you use binary mode to,
say, transfer large binary data, \fBpgreplay\fR can cause significantly more
network traffic than the original run.
.TP 4
*
Sometimes, if a connection takes longer to complete, the session ID
unexpectedly changes in the PostgreSQL log file. This causes \fBpgreplay\fR
to treat the session as two different ones, resulting in an additional
connection. This is arguably a bug in PostgreSQL.
.SH AUTHOR
Written by Laurenz Albe \fB<laurenz.albe@cybertec.at>\fR.
